It is not uncommon for certain social networks to divide into two opposing camps in response to
stress. This happens, for example, in networks of political parties during winner-takes-all elections, 
in networks of companies competing to establish technical standards, and in networks of nations 
faced with mounting threats of war. A simple model for these two-sided separations is the dynamical 
system $dX/dt = X^2$, where X is a matrix of the friendliness or unfriendliness between pairs of nodes
in the network. Previous simulations suggested that only two types of behavior were possible for 
this system: either all relationships become friendly, or two hostile factions emerge. Here we prove 
that for generic initial conditions, these are indeed the only possible outcomes. Our analysis yields a 
closed-form expression for faction membership as a function of the initial conditions, and implies that 
the initial amount of friendliness in large social networks (started from random initial conditions) 
determines whether they will end up in intractable conflict or global harmony. 



INTRODUCTION 

The mathematical model that we want to study is best 
understood as an outgrowth of a theory from social psy- 
chology known as structural balance [1]. So let's begin 
with a brief explanation of what this theory says. 

Consider three individuals: Anna, Bill and Carl, and 
suppose that Bill and Carl are friends with Anna, but are 
unfriendly with each other. If the sentiment in the rela- 
tionships is strong enough, Bill may try to strengthen his 
friendship with Anna by encouraging her to turn against 
Carl, and Carl might likewise try to convince Anna to 
terminate her friendship with Bill. Anna, for her own 
part, may try to bring Bill and Carl together so they can 
reconcile and become friends. In abstract terms, rela- 
tionship triangles containing exactly two friendships are 
prone to transition to triangles with either one or three 
friendships. 

Alternately, suppose that Anna, Bill and Carl all view 
each other as rivals. In many such situations, there are 
incentives for the two people in the weakest rivalry to co- 
operate and form a working friendship or alliance against 
the third. In these cases, a single friendship may be 
prone to appear in a relationship triangle that initially 
has none. 

These two thought experiments suggest a notion of stability,
or balance, that can be traced back to the work of
Heider [2]. Heider's theory was expanded into a graph-theoretic
framework by Cartwright and Harary [3], who
considered graphs on n nodes (representing people, countries
or corporations) with edges signed either positive
(+) to denote friendship or negative (−) to denote rivalry.
If a social network feels the proper social stresses (those 
felt by Anna, Bill and Carl in the examples above) , then 
Cartwright and Harary's theory predicts that in steady 
state the triangles in the graph should contain an odd 
number of positive edges — in other words, three positive 
edges or one positive edge and two negative edges. We
refer to such triangles as balanced, and triangles with an
even number of positive edges as unbalanced. Finally, we 
call a graph complete if it contains edges between all pairs 
of nodes, and we say that a complete graph with signs 
on its edges is balanced if all its triangles are balanced. 
(All graphs in our discussion will be complete.) 

As it turns out, these local notions of balance theory 
are closely related to the global structure of two opposing 
factions. In particular, suppose that the nodes of a complete
graph are partitioned into two factions such that
all edges inside each faction are positive and all edges
between nodes in opposite factions are negative. (One 
of these factions may be empty, in which case the other 
faction includes all the nodes in the graph, and conse- 
quently all edges of the network are positive.) Note that 
this network must be balanced, since each triangle either 
has all three members in the same faction (yielding three 
positive edges) or has two members in one faction and 
the third member in the other faction (yielding one pos- 
itive edge and two negative ones). In fact, a stronger 
and less obvious statement is true: any balanced graph 
can be partitioned into two factions in this way, with one 
faction possibly empty [3]. As a result, when we speak of 
balanced graphs, we can equivalently speak of networks 
with this type of two-faction structure. 
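
The equivalence between balanced triangles and a two-faction split can be checked mechanically on small examples. The Python sketch below is illustrative only: the function names `is_balanced` and `two_factions` and the list-of-lists sign representation are assumptions of this example, not part of Cartwright and Harary's formulation.

```python
from itertools import combinations

def is_balanced(sign):
    """sign[i][j] is +1 or -1 for i != j (symmetric); the diagonal is ignored."""
    n = len(sign)
    # A triangle is balanced when the product of its three edge signs is +1,
    # i.e. it has three positive edges or one positive and two negative edges.
    return all(sign[i][j] * sign[j][k] * sign[i][k] > 0
               for i, j, k in combinations(range(n), 3))

def two_factions(sign):
    """Split nodes by their relationship to node 0 (meaningful only if balanced)."""
    n = len(sign)
    first = {0} | {j for j in range(1, n) if sign[0][j] > 0}
    second = set(range(n)) - first
    return first, second

# Hypothetical 5-node network built from a known split {0, 1, 2} versus {3, 4}.
membership = [0, 0, 0, 1, 1]
signs = [[1 if membership[i] == membership[j] else -1 for j in range(5)]
         for i in range(5)]
```

In a balanced graph, node 0's friends form one faction and its rivals the other, so a single row of the sign matrix suffices to read off the partition.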



MODEL 

Structural balance is a static theory — it posits what 
a "stable" signing of a social network should look like. 
However its underlying motivation is dynamic, based on 
how unbalanced triangles ought to resolve to balanced 
ones. This situation has led naturally to a search for 
a full dynamic theory of structural balance. Yet find- 
ing systems that reliably guide networks to balance has 
proved a challenge in itself. 

A first exploration of this issue was conducted by Antal 
et al. [4], who considered a family of discrete-time models.
In one of the main models of this family, an edge of the 
graph is examined in each time step, and its sign is flipped 
if this produces more balanced triangles than unbalanced 
ones. While a balanced graph is a stable point for these 
discrete dynamics, it turns out that many unbalanced 
graphs called jammed states are as well [4, 5]. 

Thus, the natural problem became to identify and rigorously
analyze a simple system that could progress to
balanced graphs from generic initial configurations. A
novel approach to this problem was taken by Kulakowski,
Gawronski, and Gronek [6], who proposed a continuous-time
model for structural balance. They represented the
state of a completely connected social network using a
real symmetric $n \times n$ matrix X whose entry $x_{ij}$ represents
the strength of the friendliness or unfriendliness between
nodes i and j (a positive value denotes a friendly relationship
and a negative value an unfriendly one). Note that
for a given X, there is a signed complete graph with edge
signs equal to the signs of the corresponding elements $x_{ij}$
in X. We will call X balanced if this associated signed
complete graph is balanced.

Kulakowski et al. considered variations on the following
basic differential equation, which they proposed as
a dynamical system governing the evolution of the relationships
over time:

$$\frac{dX}{dt} = X^2. \tag{1}$$

Remarkably, simulations showed that for essentially any
initial X(0), the system reached a balanced pattern of
edge signs in finite time.

Writing Eq. 1 directly in terms of the entries $x_{ij}$ gives
a sense for why this differential equation should promote
balance:

$$\frac{dx_{ij}}{dt} = \sum_k x_{ik} x_{kj}. \tag{2}$$

Notice that $x_{ij}$ is being pushed in a positive or negative
direction based on the relationships that i and j have
with k: if $x_{ik}$ and $x_{kj}$ have the same sign, their product
guides the value of $x_{ij}$ in the positive direction, while if
$x_{ik}$ and $x_{kj}$ have opposite signs, their product guides the
value of $x_{ij}$ in the negative direction. In each case, this
is the direction required to balance the triangle {i, j, k}.
Note also that Eq. 2 applies for the case that i = j. While
this case is harder to interpret, the monotonic increase
of $x_{ii}$ implied by Eq. 2 might be viewed in psychological
terms as an increase of self-approval or self-confidence as
i becomes more resolute in its opinions about others in
the network.
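
The finite-time approach to a balanced sign pattern can be reproduced numerically on a small example. The sketch below integrates dX/dt = X·X with a classical fourth-order Runge-Kutta step and stops once the Frobenius norm exceeds a cap. The function name `integrate_until_large`, the step size, the cap, and the 5-node starting matrix are all assumptions chosen for illustration; they are not the simulation setup of Kulakowski et al.

```python
import numpy as np

def integrate_until_large(X0, h=1e-4, norm_cap=100.0, max_steps=1_000_000):
    """RK4 integration of dX/dt = X @ X, stopped when ||X||_F exceeds norm_cap."""
    X = np.array(X0, dtype=float)
    t = 0.0
    for _ in range(max_steps):
        if np.linalg.norm(X) > norm_cap:   # Frobenius norm by default
            break
        k1 = X @ X
        X2 = X + 0.5 * h * k1
        k2 = X2 @ X2
        X3 = X + 0.5 * h * k2
        k3 = X3 @ X3
        X4 = X + h * k3
        k4 = X4 @ X4
        X = X + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return X, t

# Hypothetical start: nodes 0 and 1 begin as rivals (entry -0.35) even though
# the dominant direction v places them on the same side.
v = np.array([1.0, 1.0, 1.0, -1.0, -1.0]) / np.sqrt(5.0)
u = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X0 = 2.0 * np.outer(v, v) + 1.5 * np.outer(u, u)

X_late, t_stop = integrate_until_large(X0)
late_signs = np.sign(X_late)
```

For this starting matrix the initial triangle {0, 1, 2} is unbalanced, while the late-time signs split the nodes into {0, 1, 2} and {3, 4}.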

For a network with just three nodes, it can be easily
proved that a variant of these dynamics generically
balances the single triangle in this network; such a three-node
analysis has been given by Kulakowski et al. [6],
and we describe a short proof in the Supporting Information.
What is much less clear, however, is how the system
should behave with a larger number of nodes, when the
effects governing any one edge {i, j} are summed over all
nodes k to produce a single aggregate effect on $x_{ij}$.

It has therefore been an open problem to prove that 
Eq. 2 or any of the related systems studied by Kulakowski 
et al. will bring a generic initial matrix X(0) to a balanced
state. It has also been an open problem to characterize
the structure of the balanced state that arises as a
function of the starting state X(0).

RESULTS 

In this paper, we resolve these two open problems. 
We first show that for a random initial matrix (drawn 
from any absolutely continuous distribution) , the system 
reaches a balanced matrix in finite time with a probabil- 
ity converging to 1 in the number of nodes n. In addition, 
we provide a closed-form expression for this balanced ma- 
trix in terms of the initial one; essentially, we discover 
that the system of differential equations serves to "col- 
lapse" the starting matrix to a nearby rank-one matrix. 
We also characterize additional aspects of the process, 
giving for example a description of an "exceptional" set
of matrices of probability measure converging to 0 in n
for which the dynamics are not necessarily guaranteed to
produce a balanced state.

We then analyze the solutions of the system for classes
of random matrices in the large-n limit. In particular,
we consider the case in which each unique matrix entry
is drawn independently from a distribution with bounded
support that is symmetric about a number $\mu$ (the mean
value of the initial friendliness among the nodes). In this
case, we find a transition in the solution as $\mu$ varies: when
$\mu > 0$, the system evolves to an all-positive sign pattern,
whereas when $\mu < 0$, the system evolves to a state in
which the network is divided evenly into two all-positive
cliques connected entirely by negative edges. We end by
discussing some implications of the model and the associ- 
ated transition between harmony and conflict, including 
an evaluation of the model on empirical data and some 
potential connections to research on reconciliation in so- 
cial psychology. 

Behavior of Model: Evolution to a Balanced State 

Suppose we randomly select the $x_{ij}(0)$'s from a continuous
distribution on the real line. Then the $x_{ij}(t)$'s found
by numerical integration generally sort themselves in finite
time into the sign pattern of two feuding factions. To
reformulate this observation as a precise statement and
explain why the behavior holds so pervasively, we now
solve Eq. 1 explicitly.

Solution to model. The initial matrix X(0) is real
and symmetric by assumption, so we can write it as
$Q D(0) Q^T$, where D(0) is the diagonal matrix with the
eigenvalues of X(0), denoted $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, as
diagonal entries ordered from largest to smallest, and Q
is the orthogonal matrix with the corresponding eigenvectors
of X(0), denoted $\omega_1, \omega_2, \ldots, \omega_n$, as columns. The
superscript T signifies transposition.

The differential equation Eq. 1 is a special case of a 
general family of equations known as matrix Riccati equa- 
tions [7]. The analysis of the full family is complicated 
and not fully resolved, but we now show that the special 
case of concern to us, Eq. 1, has an explicit solution with 
a form that exposes its connections to structural bal- 
ance. We proceed as follows. First, we observe that by 
separation of variables, the solution of the single-variable
differential equation $\dot{x} = x^2$ (overdot representing
differentiation by time) with initial condition $x(0) = \lambda_k$ is

$$\ell_k(t) = \frac{\lambda_k}{1 - \lambda_k t}. \tag{3}$$

Therefore the diagonal matrix $D(t) = \mathrm{diag}(\ell_1(t), \ell_2(t), \ldots, \ell_n(t))$
is the solution of Eq. 1 for the initial
condition $X(0) = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$.

Moreover, $Y(t) = Q D(t) Q^T$ is also a solution of Eq. 1,
since $\dot{Y} = Q \dot{D} Q^T = Q D^2 Q^T = (Q D Q^T)^2 = Y^2$. But
Y(t) has the same initial condition as X(t) in our original
problem: $Y(0) = Q D(0) Q^T = X(0)$. So by uniqueness,
$Y(t) = Q D(t) Q^T$ must be the solution we seek.
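
The construction above translates almost line by line into code. The following is a minimal sketch assuming NumPy; the function name `closed_form_solution` and the 2 × 2 test matrix are hypothetical choices made for this example.

```python
import numpy as np

def closed_form_solution(X0, t):
    """Return X(t) = Q diag(lam_k / (1 - lam_k t)) Q^T for a real symmetric X0."""
    lam, Q = np.linalg.eigh(X0)       # ascending eigenvalues, orthonormal columns
    ell = lam / (1.0 - lam * t)       # Eq. 3 applied to each eigenvalue
    return (Q * ell) @ Q.T            # equivalent to Q @ np.diag(ell) @ Q.T

# Hypothetical example; its largest eigenvalue is about 0.58, so t = 0.3 is
# well before the blow-up time.
X0 = np.array([[0.5, -0.2],
               [-0.2, 0.1]])
t, eps = 0.3, 1e-5
X_t = closed_form_solution(X0, t)
# A central finite difference approximates dX/dt for comparison with X @ X.
dX_dt = (closed_form_solution(X0, t + eps) - closed_form_solution(X0, t - eps)) / (2 * eps)
```

Comparing `dX_dt` with `X_t @ X_t` gives a quick numerical check that the eigenvalue-wise formula does satisfy Eq. 1.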

Our solution X(t) can also be written in a different way
to mimic the solution of the one-dimensional equation
$\dot{x} = x^2$. Since $x_{ij}(t) = \sum_{k=1}^{n} q_{ik} \ell_k(t) q_{jk}$, where $q_{ij}$ is
the (i, j)th entry of Q, we can expand the denominators
of the $\ell_k(t)$ functions in powers of t to rewrite X(t) as
$X(0) + X(0)^2 t + X(0)^3 t^2 + \cdots$, or more concisely,

$$X(t) = X(0)\,[I - tX(0)]^{-1}. \tag{4}$$

(Note that the matrices X(0) and $[I - tX(0)]^{-1}$ commute.)
This equation is valid for $0 \le t < 1/\lambda_1$
(assuming $\lambda_1 > 0$); the power series itself converges only
while $t < 1/\max_k |\lambda_k|$.
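
One way to check Eq. 4 without the power series is to differentiate it directly. The short derivation below is a supplementary verification rather than part of the original argument; it uses only the standard identity for the derivative of a matrix inverse, and the shorthand $M(t) = I - tX(0)$ is a symbol introduced here for convenience.

$$\frac{d}{dt} M^{-1} = -M^{-1}\,\frac{dM}{dt}\,M^{-1} = M^{-1} X(0)\, M^{-1},$$

$$\frac{dX}{dt} = X(0)\,M^{-1} X(0)\,M^{-1} = \big(X(0)\,M^{-1}\big)^2 = X(t)^2 .$$

Since M(0) = I, the expression also reduces to X(0) at t = 0, so it satisfies both Eq. 1 and the initial condition for as long as M(t) remains invertible.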

Finally we note that the above method of solving
Eq. 1 contains a reduction of the number of dynamical
variables of the system from $\binom{n+1}{2}$ to n. The $\binom{n}{2}$
constants of motion generated by this reduction are
just the off-diagonal elements of $Q^T X(t) Q = D(t)$, or
$\sum_{k=1}^{n} \sum_{\ell=1}^{n} q_{ki}\, x_{k\ell}(t)\, q_{\ell j} = 0$ for all $1 \le i < j \le n$.
Furthermore, the procedure for reducing X(t) can be easily
generalized to any system of the form $\dot{X} = f(X)$ where
f is a polynomial of X.



Behavior of solution. Let's now examine the behav- 
ior of our solution X(t) to see why in the typical case it 
splits into two factions in finite time. It turns out that 
this is the guaranteed outcome if the following three con- 
ditions hold (and as we will see below, they hold with 
probability converging to 1 as n goes to infinity) : 

1. $\lambda_1 > 0$,

2. $\lambda_1 \ne \lambda_2$ (and hence $\lambda_1 > \lambda_2$), and

3. all components of $\omega_1$ are nonzero.

To see why these conditions imply a split into two factions,
observe from Eq. 3 that each $\ell_k(t)$ diverges to infinity
at $t = 1/\lambda_k$. Since $x_{ij}(t) = \sum_{k=1}^{n} q_{ik} \ell_k(t) q_{jk}$, all
$x_{ij}$'s diverge to infinity when the $\ell_k$ with the smallest
positive $1/\lambda_k$ does. Under the first and second conditions,
this $\ell_k$ is $\ell_1$, so the blow-up time t* of Eq. 1 must
be $1/\lambda_1$. To show that the nodes are partitioned into two
factions as t approaches t*, let $\bar{X}(t) = X(t)/\|X(t)\|$
on the half-open interval [0, t*), where $\|X(t)\|$ denotes
the Frobenius norm of X(t). The matrix $\bar{X}(t)$ has the sign
pattern of X(t), and as t approaches t* it converges to
the rank-one matrix

$$X^* = Q\,\mathrm{diag}(1, 0, 0, \ldots, 0)\,Q^T = \omega_1 \omega_1^T. \tag{5}$$

Now let $\omega_{1k}$ denote the value of the kth coordinate of
$\omega_1$, and let $S = \{k : \omega_{1k} > 0\}$ and $T = \{k : \omega_{1k} < 0\}$.
Then S and T partition the node indices 1, 2, ..., n by our
condition that $\omega_1$ has no zero components. From Eq. 5,
the (i, j) entry of X* is $\omega_{1i}\omega_{1j}$, which is positive when i and j
lie in the same set and negative otherwise, so
this partition must correspond to two cliques of friends
joined by a complete bipartite graph of unfriendly ties.
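
Taken together, the blow-up time and the faction membership can be read directly from the top eigenpair of X(0). The sketch below assumes NumPy; the function name `predict_factions`, the tolerance `tol`, and the example matrix are hypothetical, and the checks on conditions 2 and 3 are simple numerical thresholds rather than rigorous tests.

```python
import numpy as np

def predict_factions(X0, tol=1e-12):
    """Return (t_star, S, T) from the top eigenpair of a real symmetric X0."""
    lam, Q = np.linalg.eigh(X0)            # eigenvalues in ascending order
    lam1, w1 = lam[-1], Q[:, -1]
    if lam1 <= 0:
        raise ValueError("condition 1 fails: largest eigenvalue is not positive")
    if len(lam) > 1 and lam1 - lam[-2] <= tol:
        raise ValueError("condition 2 fails: top eigenvalue is (numerically) repeated")
    if np.min(np.abs(w1)) <= tol:
        raise ValueError("condition 3 fails: top eigenvector has a zero component")
    S = {int(k) for k in np.flatnonzero(w1 > 0)}
    T = {int(k) for k in np.flatnonzero(w1 < 0)}
    return 1.0 / lam1, S, T

# Hypothetical 5-node example with top eigenvalue 2 and top eigenvector v.
v = np.array([1.0, 1.0, 1.0, -1.0, -1.0]) / np.sqrt(5.0)
u = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X0 = 2.0 * np.outer(v, v) + 1.5 * np.outer(u, u)
t_star, S, T = predict_factions(X0)
```

Because an eigenvector is defined only up to an overall sign, the labels S and T may be swapped between runs or libraries; only the partition itself is meaningful.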

The three conditions. We now return to the three 
conditions above. We first show that the second and third 
hold with probability 1. We then show that the first con- 
dition holds with probability converging to 1 as n goes to 
infinity. Lastly, we analyze the behavior of the system in 
the unlikely event that the first condition does not hold. 
The fact that the conjunction of all three conditions holds 
with probability converging to 1 as n grows large justifies 
our earlier claim that the behavior described above holds 
for almost all choices of initial conditions. 

First we show why the second and third conditions 
hold with probability 1 so long as the (joint) distribution 
from which X(0) is drawn is absolutely continuous with 
respect to Lebesgue measure; in other words, it assigns
probability zero to any set of matrices whose Lebesgue
measure is zero. Our arguments below make use of the 
following two basic facts: 

i. the set of zeros of a nontrivial multivariate polyno- 
mial has Lebesgue measure zero, and 

ii. the existence of a common root of two univariate 
polynomials P and Q is equivalent to the vanishing of 
a multivariate polynomial in the coefficients of P and 
Q (specifically, it is equivalent to the vanishing of the 
determinant of the Sylvester matrix of P and Q, also 
called the resultant of P and Q). 
To show that $\lambda_1 \ne \lambda_2$ with probability 1, let P denote
the characteristic polynomial of X(0), and let Q denote 
the derivative of P. Then X(0) has a repeated eigenvalue 
if and only if P has a repeated root, which it does if and 
only if P and Q have a common root. This condition 
is equivalent to the vanishing of the resultant of P and 
Q, which is a multivariate polynomial in the entries of 
X(0). The polynomial cannot be zero everywhere, be- 
cause there is at least one symmetric matrix that does 
not have a repeated eigenvalue. So the set of matrices 
having a repeated eigenvalue has Lebesgue measure zero. 
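
For intuition, the argument can be carried out by hand in the smallest case. The following worked example for n = 2 is an illustration added here, with entry names a, b, c chosen for this example only. With

$$X(0) = \begin{pmatrix} a & b \\ b & c \end{pmatrix}, \qquad P(\lambda) = \lambda^2 - (a + c)\lambda + (ac - b^2),$$

the resultant of P and its derivative equals, up to sign, the discriminant

$$(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 .$$

This polynomial in (a, b, c) vanishes only when a = c and b = 0, which is a line in the three-dimensional space of symmetric 2 × 2 matrices and therefore has Lebesgue measure zero.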

Similarly, to show that all components of $\omega_1$ are
nonzero, let P denote the characteristic polynomial of
X(0) and $P_i$ the characteristic polynomial of the
(n − 1) × (n − 1) submatrix $X_i(0)$ obtained by deleting the ith
row and ith column of X(0). It is easy to check that if
any eigenvector of X(0) has a zero in its ith component,
then the vector obtained by deleting that component is
an eigenvector of $X_i(0)$ with the same eigenvalue. Consequently,
P and $P_i$ must have a common root, implying
that the resultant of P and $P_i$ vanishes. This resultant
is once again a multivariate polynomial in the entries
of X(0), and once again it must be nonzero somewhere
because there is at least one symmetric matrix whose
eigenvectors all have nonzero entries. Hence, the set of
matrices having an eigenvector with zero in its ith component
has Lebesgue measure zero.

Finally, to determine the likelihood of the first condition,
we first must say a bit more about the way that
X(0) is selected. Suppose that the off-diagonal $x_{ij}(0)$'s
are drawn randomly from a common distribution F and
the on-diagonal $x_{ii}(0)$'s are drawn randomly from a common
distribution G. All selections are independent for
$i \le j$. (For i > j, we let $x_{ij}(0) = x_{ji}(0)$, so that X(0)
is symmetric.) For this construction of X(0), Arnold [9]
has shown that with the remarkably weak additional assumption
that F has a finite second moment, Wigner's
semicircle law holds in probability as n grows to infinity.
This in turn implies that $\lambda_1 > 0$ in probability in the
same limit.

Moreover, suppose we are in the low-probability case
that $\lambda_1 < 0$. In this case, the analysis above shows that
all the functions $\ell_i(t)$ converge to 0 as $t \to \infty$. Thus,
$\lim_{t \to \infty} D(t) = 0$, and since $X(t) = Q D(t) Q^T$, we also
have $\lim_{t \to \infty} X(t) = 0$.

Although the entries of X(t) converge to zero when
$\lambda_1 < 0$, one might still want to know if the sign pattern
of X(t) is eventually constant (i.e., remains unchanged
for all t above some threshold value) and, if so, what
determines this sign pattern. It is possible to answer
this question, again assuming the second and third conditions.
By expanding the function $\ell_i(t) = \lambda_i/(1 - \lambda_i t)$
in powers of u = 1/t, we obtain the asymptotic series

$$\ell_i(t) = -u - u^2 \lambda_i^{-1} - O(u^3), \tag{6}$$

which implies

$$X(t) = Q D(t) Q^T = -uI - u^2 X(0)^{-1} - O(u^3). \tag{7}$$

In the limit of small u, the leading-order term of the
diagonal entries of X(t) is the linear term, which has
negative sign. For the off-diagonal entries of X(t), the
leading-order term as u tends to zero is the quadratic
term, whose sign matches the sign of the corresponding
off-diagonal entry of the matrix $-X(0)^{-1}$.
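
The late-time sign rule for the case of a negative top eigenvalue can be checked on a small example. The sketch below assumes NumPy; the helper `late_time_state`, the 3 × 3 matrix (chosen to be negative definite by diagonal dominance), and the evaluation time t = 1000 are illustrative assumptions.

```python
import numpy as np

def late_time_state(X0, t):
    """Evaluate X(t) = Q diag(lam_k / (1 - lam_k t)) Q^T for symmetric X0."""
    lam, Q = np.linalg.eigh(X0)
    ell = lam / (1.0 - lam * t)
    return (Q * ell) @ Q.T

# Hypothetical negative-definite starting matrix (all eigenvalues below zero).
X0 = np.array([[-2.0, 0.5, 0.3],
               [0.5, -1.5, -0.4],
               [0.3, -0.4, -1.0]])

X_late = late_time_state(X0, 1.0e3)
predicted = np.sign(-np.linalg.inv(X0))   # sign pattern suggested by Eq. 7
off_diag = ~np.eye(3, dtype=bool)         # mask selecting off-diagonal entries
```

At t = 1000 the entries are tiny (of order 1/t on the diagonal and 1/t² off it), but their signs already agree with the prediction.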

Behavior of Model: From Factions to Unification 

The analysis in the previous section tells us how to find 
both the blow-up time t* and final sign configuration of 
a network if we know its initial state X(0). However we 
might also want to know whether we can characterize the 
behavior of X(t) in the large-n limit in terms of statistical
parameters of X(0). This could, for example, help us 
forecast the behavior of large populations when collecting 
complete relationship-level data is not feasible. 

In this section, we show that there is a transition from 
final states consisting of two factions to final states con- 
sisting of all positive relations as the "mean friendliness" 
of X(0) (the mean of the distributions used to gener- 
ate the off-diagonal entries of X(0)) is increased from 
negative to positive values. This is consistent with the 
numerical simulations shown in Fig. 1. 

Before discussing the details though, we should describe
how X(0) is selected in this section. We start by
adopting the procedure of Füredi and Komlós [8]: the elements
$x_{ij}(0)$ are drawn independently from distributions
$F_{ij}$ with zero mass outside of [−K, K]. The off-diagonal
$F_{ij}$'s have a common expectation $\mu$ and finite variance
$\sigma^2$, while the on-diagonal $F_{ii}$'s have a common expectation
$\nu$ and variance $\tau^2$. In addition, we require that
each off-diagonal distribution $F_{ij}$ be symmetric about $\mu$.
Now let's consider the three cases of positive, zero and
negative $\mu$.

Case 1: $\mu > 0$. The results of Füredi and Komlós [8]
show that when $\mu > 0$, the deviation of $\omega_1$ from
$(1, 1, \ldots, 1)/\sqrt{n}$ vanishes in probability in the large-n
limit. Hence the final state of the system consists of
one large clique of friends containing all but at most a
vanishing fraction of the nodes. Moreover, by assuming
a bound on $\sigma$ we can strengthen this statement further:
if $\sigma < \mu/2$, then the findings of Füredi and Komlós imply
that the final state consists of a single clique of friends,
with no negative edges. These observations are consistent
with the representative numerical trial shown in Fig. 1A.
Moreover, Füredi and Komlós show that $\lambda_1$
grows like $\mu n + O(1)$, and hence the blow-up
time scales like $1/(\mu n)$.
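
These large-n statements are easy to probe numerically for a single random draw. The sketch below assumes NumPy and, purely for illustration, draws every entry (diagonal included) from a uniform distribution centered on μ; the helper `sample_symmetric`, the seed, and the parameter values are assumptions of this example rather than the setup used for Fig. 1.

```python
import numpy as np

def sample_symmetric(n, mu, half_width, rng):
    """Symmetric matrix with entries uniform on [mu - half_width, mu + half_width]."""
    A = rng.uniform(mu - half_width, mu + half_width, size=(n, n))
    return np.triu(A) + np.triu(A, 1).T    # mirror the upper triangle

rng = np.random.default_rng(0)
n, mu, half_width = 200, 1.0, 0.5
sigma = half_width / np.sqrt(3.0)          # std. dev. of the uniform distribution

X0 = sample_symmetric(n, mu, half_width, rng)
lam, Q = np.linalg.eigh(X0)
lam1, w1 = lam[-1], Q[:, -1]
w1 = w1 * np.sign(w1.sum())                # fix the arbitrary overall sign
deviation = np.linalg.norm(w1 - np.ones(n) / np.sqrt(n))
```

With these parameters σ < μ/2, so one would expect `lam1` to sit close to μn, every component of `w1` to share one sign, and `deviation` to be small.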

We can gain insight into the behavior of the system 
for small t using an informal Taylor series calculation: 
FIG. 1: Representative large-n plots of the model for (A) $\mu > 0$ ($\mu = 3/10$ in the plot shown), (B) $\mu = 0$, and (C) $\mu < 0$ ($\mu = -3$
in the plot shown). For all three plots, $\sigma = 1$ and n = 90. To reduce image complexity, only one randomly sampled fifth of the
trajectories is included. In the second plot, t* denotes the time at which the system diverges, and $\epsilon$ denotes a sufficiently small
displacement. The white curves superimposed on the three plots are the large-n trajectories $x_{ij}(t) = x_{ij}(0) - \mu + \mu/(1 - \mu n c t)$
for $x_{ij}(0) = \mu,\ \mu \pm 3\sigma/2$, where c represents a rescaling of time. Since we want to fix the blow-up time t* near 1 and since
$c t^* = 1/\lambda_1$ as found in the text, we choose $c = 1/(\mu n + \nu - \mu + \sigma^2/\mu)$ for (A) and $c = 1/(2\sigma\sqrt{n})$ for (B) and (C) using estimates
of $\lambda_1$ taken from Ref. [8]. The black dotted lines mark the blow-up times $t^* = 1/(c\lambda_1)$.



if we rescale time in Eq. 1 by inserting a 1/n before the
summation, compute the Taylor expansion of $x_{ij}(t)$ term-by-term
and then take the expectation of each term, we
obtain the geometric series $x(t) = \mu + \mu^2 t + \mu^3 t^2 + \cdots$,
or

$$x(t) = \frac{\mu}{1 - \mu t}. \tag{8}$$

With significantly more work, it can be proved that every
trajectory $x_{ij}(t)$ has this time dependence on [0, 1/K) in
the large-n limit with probability 1 (see the Supporting
Information), so we may write

$$\lim_{n \to \infty} x_{ij}(t) = x_{ij}(0) - \mu + \frac{\mu}{1 - \mu t} \quad \text{with prob. 1} \tag{9}$$

for all t in [0, 1/K). Observe that this limit has a blow-up
time t* of $1/\mu$. Since our rescaling of time represents a
zooming in or magnification of time by a factor of n, this
t* corresponds to a blow-up time asymptotic to $1/(\mu n)$
for the unrescaled system, consistent with the results of
Füredi and Komlós.

Case 2: $\mu = 0$. In the event that the network starts
from a mean friendliness of zero, numerical experiments
indicate that the system ends up with two factions of
equal size in the large-n limit (Fig. 1B). We now prove
this to be the case. For the remainder of this discussion,
we will abbreviate X(0) as A and $x_{ij}(0)$ as $a_{ij}$.

Since the off-diagonal entries of A have symmetric distributions
by assumption, we have for any off-diagonal
$a_{ij}$ and any interval $S_{ij}$ on the real line that
$P(a_{ij} \in S_{ij}) = P(-a_{ij} \in S_{ij})$. Now let D be a diagonal matrix
with some sequence of +1 and −1 along its diagonal
(where the ith diagonal entry is denoted by $d_i$). Then
the random matrices A and B = DAD are identically
distributed, as we will now show.

To say that A and B are identically distributed means
that for every Borel set of matrices S, $P(A \in S) = P(B \in S)$.
To prove this, it suffices to consider the case in which
S is a product of intervals $S_{ij}$, since these product sets
generate the Borel sigma-algebra. The entries of A are
independent, so $P(A \in S) = \prod_{i \le j} P(a_{ij} \in S_{ij})$. Similarly,
$P(B \in S) = \prod_{i \le j} P(d_i a_{ij} d_j \in S_{ij})$. By the symmetry
of the off-diagonal distributions,
$\prod_{i \le j} P(a_{ij} \in S_{ij}) = \prod_{i \le j} P(d_i a_{ij} d_j \in S_{ij})$,
which gives us $P(A \in S) = P(B \in S)$ as desired.
(Note that when i = j, the factor $d_i d_j$ is 1,
so the on-diagonal distributions need not be symmetric.)

Now consider the set S of matrices with an $\omega_1$ consisting
of all positive components. The above demonstration
implies that the probability of choosing an A in this
set is the same as choosing an A such that B is in this
set. Regarding the latter event, $A(D\omega_i) = \lambda_i(D\omega_i)$ implies
$B\omega_i = \lambda_i\omega_i$ (using $D^2 = I$), so the $\lambda_1$ eigenvector of the A used
to compute B is $D\omega_1$. This demonstrates that all sign
patterns for the components of $\omega_1$ are equally likely. In
other words, the distribution of the number of positive
components in $\omega_1$ is the binomial distribution B(n, 1/2)
and the fraction of positive components in $\omega_1$ converges
(in several senses) to 1/2 as n grows large.
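
The algebraic step in this argument, that conjugating by a diagonal sign matrix preserves the spectrum and flips eigenvector components, can be confirmed numerically. The sketch below assumes NumPy; the helper `conjugate_by_signs`, the seed, and the particular sign vector `d` are hypothetical choices for illustration.

```python
import numpy as np

def conjugate_by_signs(A, d):
    """Return B = D A D, where D = diag(d) and every d[i] is +1 or -1."""
    return d[:, None] * A * d[None, :]

rng = np.random.default_rng(1)
n = 6
A = rng.uniform(-1.0, 1.0, size=(n, n))
A = np.triu(A) + np.triu(A, 1).T           # symmetric, zero-mean entries
d = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
B = conjugate_by_signs(A, d)

lam_A, Q_A = np.linalg.eigh(A)
lam_B, Q_B = np.linalg.eigh(B)
# If w is the top eigenvector of B, then D w should be the top eigenvector
# of A, up to the usual overall sign ambiguity.
overlap = abs(np.dot(d * Q_B[:, -1], Q_A[:, -1]))
```

An `overlap` of 1 (to rounding error) indicates that the two top eigenvectors differ exactly by the sign flips encoded in `d`.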

Additionally, we can consider how $\lambda_1$ varies with n in
the case that $\mu = 0$ to determine when the blow-up will
occur. Füredi and Komlós [8] found for this case that
$\lambda_1 = 2\sigma\sqrt{n} + O(n^{1/3} \log n)$ with probability tending to 1,
so with probability tending to 1 the blow-up time shrinks
to zero like $1/\sqrt{n}$, an order of $\sqrt{n}$ slower than in the $\mu > 0$
case.

Case 3: $\mu < 0$. For this final case, Füredi and
Komlós [8] found that $\lambda_1 \le 2\sigma\sqrt{n} + O(n^{1/3} \log n)$ with
probability tending to 1. The semicircle law gives a lower
bound: $\lambda_1 \ge 2\sigma\sqrt{n} + o(\sqrt{n})$ in probability. So the blow-up
time goes to zero like $1/\sqrt{n}$ in the unrescaled system.

Note also that if we define a new matrix C = −A where
A is now the initial matrix X(0) of Case 3, then C satisfies
the condition of Case 1, $\mu > 0$. Thus the distance
between the top eigenvector of C and $(1, 1, \ldots, 1)/\sqrt{n}$
declines to zero in probability just as in Case 1. Furthermore,
every other eigenvector of C is orthogonal to the
largest one. Hence if $\sigma < |\mu|/2$, then with probability
tending to 1, every other eigenvector acquires a mixture
of positive and negative components in the large-n limit,
including the bottom eigenvector of C, which is the top
eigenvector of A. This establishes that in the case that
$\mu < 0$ and $\sigma < |\mu|/2$, the system ends up in a state with
two factions with probability converging to 1 as n grows
large.

Numerical simulations of the case that $\mu < 0$ suggest
the conjecture that the two factions are approximately
equal in size for large n. Furthermore, the derivation
of Eq. 9 is in fact valid for all $\mu$, so each trajectory
rapidly decays from $x_{ij}(0)$ toward $x_{ij}(0) - \mu$ on [0, 1/K)
(Fig. 1C). This transient decay appears to extend beyond
t = 1/K in numerical simulations. So, for example,
if time is rescaled by $1/\sqrt{n}$ instead of 1/n, we would hypothesize
that (i) each trajectory makes a complete jump
from $x_{ij}(0)$ to $x_{ij}(0) - \mu$ in the large-n limit, and that
(ii) from this point onward, the system behaves like an
initial configuration of the $\mu = 0$ case and so separates
into two equal factions en route to its blow-up at $1/(2\sigma)$.
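
The conjectured even split for negative mean friendliness can be explored with a single random draw. This sketch assumes NumPy and uniformly distributed entries; the seed, the size n = 400, and the other parameter values are hypothetical, and one sample is of course only suggestive of the large-n behavior.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, half_width = 400, -1.0, 0.5
sigma = half_width / np.sqrt(3.0)          # std. dev. of the uniform distribution

A = rng.uniform(mu - half_width, mu + half_width, size=(n, n))
X0 = np.triu(A) + np.triu(A, 1).T          # symmetric initial matrix with mean mu < 0

lam, Q = np.linalg.eigh(X0)
lam1, w1 = lam[-1], Q[:, -1]
fraction_positive = float(np.mean(w1 > 0)) # share of nodes in one faction
edge_estimate = 2.0 * sigma * np.sqrt(n)   # the 2*sigma*sqrt(n) scale quoted above
```

In runs of this kind one would expect `lam1` to lie near `edge_estimate` and `fraction_positive` to be close to one half, in line with the conjecture.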



DISCUSSION 

In this final section, we review our results and their sig- 
nificance relative to previous work in structural balance 
theory. We then compare the predictions of the model 
with data, discuss potential criticisms of the model, and 
finish with some intriguing connections between the be- 
havior of the model and recent social-psychological work 
on neutralizing two-sided conflicts. 

Our first result is a demonstration that the model 
forms two factions in finite time across a broad set of 
initial conditions. As noted at the outset, similar demon- 
strations have not been possible for dynamic models of 
structural balance in earlier literature because these mod- 
els contained so-called jammed states that could trap a 
social network before it reached a two-faction configuration
[4, 5]. The model of Kulakowski et al., by contrast,
has no such jammed states for generic initial conditions
and hence provides a robust means for a social network
to balance itself.

The second result of the paper is the discovery and 
characterization of a transition from global polarization 
to global harmony as the initial mean friendliness of the 
network crosses from nonpositive to positive values. Similar
transitions have been observed in other models of
structural balance but so far none has been characterized
at a quantitative level. For example, Antal et al. [4]
found a nonlinear transition from two cliques of equal
size to a single unified clique as the fraction of positively
signed edges at t = 0 was increased from 0 to 1 (see Fig.
5 of Ref. [4]). The authors provided a qualitative argument
for this transition, but left open the problem of its
quantitative detail. Our results both confirm the gen- 
erality of their observations and provide a quantitative 
account of a transition analogous to theirs. 

To complement the theoretical nature of our work and 
get a better sense of how the model behaves in practice, 
we can numerically integrate it for several cases of empir- 
ical social network data where the real-life outcomes of 
the time-evolution are known. Our first example is based 
on a study by Zachary [10] who witnessed the break-up 
of a karate club into two smaller clubs. Prior to the sep- 
aration, Zachary collected counts of the number of social 
contexts in which each pair of individuals interacted out- 
side of the karate club, with the idea being that the more 
social contexts they shared, the greater the likelihood 
for information exchange. These counts, or capacities as 
Zachary called them, can be converted to estimates of 
friendliness and rivalry in many different ways. For a 
large class of such conversions, Eq. 1 predicts the same 
division that Zachary's method found, which misclassi- 
fied only 1 of the 34 club members (Fig. 2A,B). 

A second example can be constructed from the data of
a study by Axelrod and Bennett [11] regarding the aggregation
of Allied and Axis powers during World War II. If
we simply take the entries of their
propensity(i, j) · size(i) · size(j) matrix to be proportional to the friendliness felt
between the various pairs of countries in the war, then
running the model gives the correct Allied-Axis split for
all countries except Denmark and Portugal (Fig. 2C).

Despite these modest successes, the model could still 
be criticized as "a simplification and an idealization, and 
consequently a falsification" [12]. Clearly, human behav- 
ior is more complicated than what is captured by Eq. 1. 
However, deliberate simplicity is a common feature of 
many foundational mathematical models of basic social 
phenomena, which are often designed to isolate and study 
the effect of a single social force. Such models can be par- 
ticularly appropriate in extreme settings where this single 
force plays a dominant role, making human choices more 
constrained and thus perhaps more predictable. In the 
present case, the Kulakowski et al. model is designed 
to ignore all other social behaviors besides the urge to 
make one's friendships and rivalries consistent. In this 

FIG. 2: Tests of the model of Kulakowski et al. (Eq. 1) against two existing data sets. (A) The evolution of the model
starting from Zachary's capacity matrix with the capacity of each relationship reduced by 0.58. This is the minimal downward
displacement necessary (to two significant figures) for the resulting separation to be correct for all but 1 of the 34 club members.
For reasons described by Zachary [10], this is basically the best separation we can expect. (B) The evolution of the model
from Zachary's capacity matrix with the capacity of zero between the two club leaders replaced by −11; the resulting factions
are identical to those in (A). Substituted values less than −11 yield the same two factions, while greater values produce less
accurate divisions. (C) The evolution of the model starting from Axelrod and Bennett's 1939 propensity(i, j) · size(i) · size(j)
matrix for the 17 countries involved in World War II (by Axelrod and Bennett's definition). The model finds the correct split
into Allied and Axis powers with the exceptions of Denmark and Portugal. Axelrod and Bennett's own landscape theory of
aggregation does slightly better: its only misclassification is Portugal.



respect it is a bit like problems in classical physics involving
frictionless surfaces and massless springs; it is
a mathematical cartoon of a single aspect of our social 
experience. It may give mechanistic insight but is not 
designed for quantitative prediction. 

A more specific objection might be raised regarding the 
divergence to infinity in finite time. However, since the 
purpose of the model is to study the pattern of signs 
that emerges, our main conclusion from the model is 
that the sign pattern eventually stabilizes at a point be- 
fore the divergence. This stabilization of the sign pat- 
tern is our primary focus, and one could interpret the 
subsequent singularity as simply the straightforward and 
unimpeded "ramping up" of values caused by the system 
once all inconsistencies have been worked out of the social 
relations — the divergence itself can be viewed as taking 
place beyond the window of time over which the system 
corresponds to anything real. Alternately, one can imag- 
ine that as the community completes its separation into 
two groups, other social processes take over. For exam- 
ple, individuals with differing ideological views or social 
preferences may self-segregate, breaking the all-to-all as- 
sumption of the model. In other cases, mounting tensions 
may erupt into violence, reflecting a sort of bound on the 
relationship intensity achievable for pairs of nodes in the 
network. 
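
The claim that the sign pattern settles before the divergence can be illustrated numerically. The sketch below is ours and rests on stated assumptions: it uses NumPy, a small hypothetical random symmetric initial matrix, and the closed form X(t) = A(I - At)^{-1}, which one can check by differentiation solves dX/dt = X^2 with X(0) = A. Under these assumptions the blow-up time is 1/λ_1, where λ_1 is the largest eigenvalue of A (assumed positive), and just before blow-up the signs of X(t) agree with those of the outer product of the top eigenvector with itself.

```python
import numpy as np

def closed_form_solution(A, t):
    """Return X(t) = A (I - A t)^{-1}, which solves dX/dt = X^2 with X(0) = A."""
    lam, W = np.linalg.eigh(A)               # A = W diag(lam) W^T for symmetric A
    return (W * (lam / (1.0 - lam * t))) @ W.T

rng = np.random.default_rng(0)               # arbitrary seed, illustration only
G = rng.normal(size=(6, 6))
A = (G + G.T) / 2.0                          # hypothetical symmetric initial condition

lam, W = np.linalg.eigh(A)                   # eigenvalues in ascending order
lam1, w1 = lam[-1], W[:, -1]                 # largest eigenvalue and its eigenvector
t_star = 1.0 / lam1                          # blow-up time, meaningful only if lam1 > 0

# Evaluate the solution just before the singularity and read off the signs.
X_late = closed_form_solution(A, t_star * (1.0 - 1e-9))
signs_late = np.sign(X_late)
signs_predicted = np.sign(np.outer(w1, w1))  # rank-one sign pattern from w1
factions = w1 > 0                            # nodes grouped by the sign of their component
```

Here `factions` splits the nodes by the sign of their component in the top eigenvector; if all components share one sign, the sketch corresponds to the all-friends outcome.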

Lastly we can ask, rather speculatively, whether the 
model provides any hints on how to guide divided com- 
munities toward reconciliation (in the cases where this is 
a sensible goal). The work presented here implies that
the mean friendliness of the social network should be
an important target for modulation. This suggests one 
potential strategy: (i) direct the attention of the social 
network away from its divided status, (ii) encourage the 
formation of friendships across the divide, and then (iii) 
bring the network back to the task of managing the issue 
that originally divided it, with the hope that the increase 
in mean friendliness will push the network toward the 
all-friends configuration. Remarkably, Pettigrew, a social
psychologist, has recently proposed a similar hypothesis 
with respect to overcoming prejudice, recommending the 
longitudinal process of (i) diverting attention away from 
ingroup-outgroup distinctions, (ii) allowing strong inter- 
group friendships to form, and then (iii) refocusing the 
community on social categorization until a single group 
category emerges [13, 14]. Considering the differences in 
discipline and methodology, the similarity between Pet- 
tigrew's sequence of steps and ours is striking, and the 
combined lesson is clear: given the right combination of 
diversion and bonding exercises, it may be possible to get 
a fractured social network to resolve its differences and 
begin to heal. 
REFERENCED RESULTS

In the introduction of our paper, we assert that a variant
of the dynamics proposed by Kulakowski et al. generically
balances an isolated triangle. We explain what we
mean here.

Theorem 1. The system $\dot{x}_{12} = x_{13}x_{23}$,
$\dot{x}_{13} = x_{12}x_{23}$, $\dot{x}_{23} = x_{12}x_{13}$
achieves balance when the initial values
$x_{12}(0)$, $x_{13}(0)$ and $x_{23}(0)$ are all unequal.

Proof. Multiplying each equation by the corresponding $x_{ij}$ yields
$\dot{x}_{12}x_{12} = \dot{x}_{13}x_{13} = \dot{x}_{23}x_{23}$,
since every one of these products equals $x_{12}x_{13}x_{23}$.
Integrating these equalities gives the constraints
$x_{12}^2 - x_{13}^2 = C_1$ and $x_{12}^2 - x_{23}^2 = C_2$, which
partition the three-dimensional space of $(x_{12}, x_{13}, x_{23})$ into
trajectories (with the direction of flow given by the original
dynamical system). Examination of this flow reveals
that each initial condition $(x_{12}(0), x_{13}(0), x_{23}(0))$ with
distinct coordinates flows into one of the four octants on
which Heider balance holds, that is, where $x_{12}x_{13}x_{23} > 0$.
Furthermore, these octants each act as separate trapping
regions: once a trajectory enters, it cannot leave. Hence,
the theorem follows. □
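
As an illustration of Theorem 1 (not part of the proof), the triangle system can be integrated numerically from an unbalanced starting point. The sketch below is ours: the fourth-order Runge-Kutta stepper, the step size, the cutoff, and the initial values (1.0, -0.5, 0.3) are all hypothetical choices. It stops once some relationship exceeds the cutoff in magnitude, since the true solution diverges in finite time.

```python
def triangle_rhs(x):
    """Right-hand side of the triangle system for x = (x12, x13, x23)."""
    x12, x13, x23 = x
    return (x13 * x23, x12 * x23, x12 * x13)

def rk4_step(x, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = triangle_rhs(x)
    k2 = triangle_rhs(shifted(x, k1, dt / 2))
    k3 = triangle_rhs(shifted(x, k2, dt / 2))
    k4 = triangle_rhs(shifted(x, k3, dt))
    return tuple(xi + dt * (a + 2 * b + 2 * c + d) / 6
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))

def integrate_triangle(x0, dt=1e-3, cutoff=10.0, max_steps=1_000_000):
    """Integrate until some |x_ij| exceeds cutoff (the solution blows up in finite time)."""
    x = tuple(x0)
    for _ in range(max_steps):
        if max(abs(v) for v in x) > cutoff:
            break
        x = rk4_step(x, dt)
    return x

x0 = (1.0, -0.5, 0.3)            # unbalanced start: the product x12*x13*x23 is negative
x_final = integrate_triangle(x0)
balanced = x_final[0] * x_final[1] * x_final[2] > 0
```

For this starting point the conserved quantities are $x_{12}^2 - x_{13}^2 = 0.75$ and $x_{12}^2 - x_{23}^2 = 0.91$, which gives a convenient check on the integration.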

The next theorem regards the main system of the paper
with a rescaling of time: $\frac{d}{dt}X = n^{-1}X^2$, where $X$ is
a real symmetric $n \times n$ matrix. Recall that $x_{ij}(t)$ denotes
the $(i,j)$th element of the solution matrix $X(t)$ subject
to the initial condition $X(0)$. In the following, we will
abbreviate $X(0)$ as $A$ and $x_{ij}(0)$ as $a_{ij}$. Suppose that the
$a_{ij}$, $i \le j$, are drawn independently from distributions
$F_{ij}$ with zero mass outside $[-K, K]$, and the off-diagonal
distributions $F_{ij}$ have common expectation $\mu$ and
variance $\sigma^2$.

Theorem 2. $\lim_{n\to\infty} x_{ij}(t) = a_{ij} - \mu + \mu/(1 - \mu t)$
with probability 1 for $t \in [0, 1/K)$.

Proof. Regard each step of the limit $n \to \infty$
as a selection and concatenation of elements
$\{a_{in}\}_{1 \le i \le n-1}$, $\{a_{nj}\}_{1 \le j \le n-1}$, $a_{nn}$ to the elements
$\{a_{ij}\}_{1 \le i,j \le n-1}$ selected in preceding steps. Now consider
the partial sum of the Taylor series expansion of $x_{ij}(t)$:

$$x_{ijnN}(t) = \sum_{k=0}^{N} \alpha_{kn} t^k \quad \text{where} \quad \alpha_{kn} = \frac{1}{k!} \left. \frac{d^k x_{ij}}{dt^k} \right|_{t=0} \qquad (1)$$

The first step of the proof of Theorem 2 consists of proving
that $\lim_{N\to\infty} \lim_{n\to\infty} x_{ijnN}(t)$ converges to
$a_{ij} - \mu + \mu/(1 - \mu t)$ with probability 1 on $[0, 1/|\mu|)$ (see Lemma
1). The second step of the proof consists of proving that
$\lim_{N\to\infty} \lim_{n\to\infty} x_{ijnN}(t) = \lim_{n\to\infty} x_{ij}(t)$ with
probability 1 on $[0, 1/K)$ (see Lemma 2). Since we can
write $\lim_{n\to\infty} x_{ij}(t)$ as $\lim_{n\to\infty} \lim_{N\to\infty} x_{ijnN}(t)$, this
amounts to showing that the two limits can be exchanged
on $[0, 1/K)$. The above theorem then follows trivially by
a union bound. □
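
Theorem 2 lends itself to a quick numerical sanity check. The sketch below is ours and makes several assumptions that are not in the text: it uses NumPy, draws entries uniformly from $[\mu - 0.5, \mu + 0.5]$ with $\mu = 0.3$ (so $K = 0.8$), and evaluates the solution through the closed form $X(t) = A(I - tA/n)^{-1}$, which one can verify by differentiation solves $\frac{d}{dt}X = n^{-1}X^2$. Only off-diagonal entries are compared with $a_{ij} - \mu + \mu/(1 - \mu t)$; restricting the comparison to $i \ne j$ is our choice.

```python
import numpy as np

def rescaled_solution(A, t):
    """X(t) = A (I - t A / n)^{-1}, which solves dX/dt = X^2 / n with X(0) = A."""
    n = A.shape[0]
    return A @ np.linalg.inv(np.eye(n) - t * A / n)

def theorem2_prediction(A, mu, t):
    """Large-n limit a_ij - mu + mu / (1 - mu t), applied entrywise."""
    return A - mu + mu / (1.0 - mu * t)

rng = np.random.default_rng(1)            # arbitrary seed, illustration only
n, mu, half_width = 500, 0.3, 0.5         # entries lie in [-K, K] with K = 0.8
U = rng.uniform(mu - half_width, mu + half_width, size=(n, n))
A = np.triu(U) + np.triu(U, 1).T          # symmetric matrix with independent upper triangle
t = 0.5                                   # inside [0, 1/K) = [0, 1.25)

X = rescaled_solution(A, t)
pred = theorem2_prediction(A, mu, t)
off_diag = ~np.eye(n, dtype=bool)         # mask selecting entries with i != j
max_err = np.max(np.abs(X - pred)[off_diag])
```

In this setup `max_err` should shrink as `n` grows, which is the behavior the theorem describes; the sketch does not quantify the rate.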

Lemma 1. Under the assumptions of Theorem 2,
$\lim_{N\to\infty} \lim_{n\to\infty} x_{ijnN} = a_{ij} - \mu + \mu/(1 - \mu t)$ with
probability 1 for $t \in [0, 1/|\mu|)$.

Proof. For the sake of generality, we present a proof
with milder assumptions than those of the rest of
the paper: we only require that the moments of the $F_{ij}$
distributions be finite (and off-diagonal distributions to
have mean $\mu$), not that the $a_{ij}$ values be bounded by $K$
with probability 1.

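Define $\alpha_{k\infty} = \lim_{n\to\infty} \alpha_{kn}$ (merely shorthand; we do
not assume the limit exists). By a union bound, we have

$$\Pr\Big(\bigcap_{k=1}^{\infty} [\alpha_{k\infty} = \mu^{k+1}]\Big) \ge 1 - \sum_{k=1}^{\infty} \big[1 - \Pr(\alpha_{k\infty} = \mu^{k+1})\big] \qquad (2)$$

So if we can show that $\Pr(\alpha_{k\infty} = \mu^{k+1}) = 1$ for all
$k \ge 1$, then $\Pr(\bigcap_{k=1}^{\infty} [\alpha_{k\infty} = \mu^{k+1}]) = 1$. In this case,
$\lim_{N\to\infty} \lim_{n\to\infty} x_{ijnN}$ has the convergent Taylor series
$a_{ij} + \sum_{k=1}^{\infty} \mu^{k+1} t^k$ on $[0, 1/|\mu|)$ with probability 1, which
proves the lemma.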

So our task reduces to showing that $\Pr(\alpha_{k\infty} = \mu^{k+1}) = 1$
for each $k \ge 1$. In order to do this, we need to compute
the leading behavior of $\alpha_{kn}$ in $n$. To calculate the $k$
time derivatives of $x_{ij}$ in the formula for $\alpha_{kn}$ (see Eq. 1),
we alternate between applying the chain rule of differential
calculus and substituting in the right-hand side of
$\dot{x}_{ij} = n^{-1} \sum_{k} x_{ik} x_{kj}$ (our system $\dot{X} = n^{-1}X^2$ written in
element-wise fashion). This gives

$$\alpha_{kn} = n^{-k} \sum_{m_1=1}^{n} \sum_{m_2=1}^{n} \cdots \sum_{m_k=1}^{n} a_{i m_1} a_{m_1 m_2} \cdots a_{m_k j} \qquad (3)$$

where the factor $n^{-k}$ comes from the $k$ factors of $n^{-1}$
introduced by the $k$ derivatives, and the factor $1/k!$ in
the formula for $\alpha_{kn}$ cancels with a factor $k!$ that arises
from repeated applications of the chain rule. In Eq. 3,
the dominant term is a sum of the edge value products
of all simple length-$(k+1)$ paths between $i$ and $j$. This
sum contains $(n-2)!/(n-2-k)!$ terms. All other paths
include fewer intermediate nodes and thus have at least a
factor of $n$ fewer terms in their sums.
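
Eq. 3 says that $\alpha_{kn}$ is the $(i,j)$ entry of $A^{k+1}$ divided by $n^k$, so the coefficients can be computed directly and compared with $\mu^{k+1}$. The following is a rough sketch under our own assumptions: plain Python, a hypothetical matrix with entries uniform on $[\mu - 0.4, \mu + 0.4]$ and $\mu = 0.6$, and the arbitrary index pair $(i, j) = (0, 1)$.

```python
import random

def taylor_coefficients(A, i, j, k_max):
    """alpha_{kn} = n^{-k} (A^{k+1})_{ij} for k = 0..k_max, via repeated row-vector products."""
    n = len(A)
    row = list(A[i])                          # row i of A^1
    coeffs = [row[j]]                         # alpha_0 = a_ij
    for k in range(1, k_max + 1):
        # Multiply the current row vector by A to obtain row i of A^{k+1}.
        row = [sum(row[m] * A[m][c] for m in range(n)) for c in range(n)]
        coeffs.append(row[j] / n ** k)
    return coeffs

random.seed(0)                                # arbitrary seed, illustration only
n, mu = 600, 0.6
A = [[0.0] * n for _ in range(n)]
for r in range(n):
    for c in range(r, n):
        A[r][c] = A[c][r] = random.uniform(mu - 0.4, mu + 0.4)   # symmetric fill

alphas = taylor_coefficients(A, 0, 1, 3)
targets = [mu ** (k + 1) for k in range(1, 4)]   # mu^2, mu^3, mu^4
```

For large `n` the entries `alphas[1:]` should sit close to `targets`, while `alphas[0]` is simply $a_{ij}$ and does not concentrate.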

Our goal then for the remainder of the proof is to show
that the first term of Eq. 3 is the only term that remains
after taking $n$ to infinity, and that it converges to $\mu^{k+1}$
with probability 1. To simplify notation, let $\ell$ denote the
product of the edge values $a_{ij}$ along a particular path of
length $k+1$ (not necessarily simple) from $i$ to $j$, and let
$L$ denote the set of all such products on paths with the
same configuration, or pattern of connectivity. Denote
the set of all $L$ by $\{L\}$, and let $S$ denote the one $L$ in
$\{L\}$ consisting of simple paths of length $k+1$ from $i$ to $j$.

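Now observe that
$\bigcap_{\{L\} \setminus S} [\lim_{n\to\infty} n^{-k} \sum_{\ell \in L} \ell = 0] \cap [\lim_{n\to\infty} n^{-k} \sum_{\ell \in S} \ell = \mu^{k+1}] \subseteq [\alpha_{k\infty} = \mu^{k+1}]$.
So by another union bound,

$$\Pr(\alpha_{k\infty} = \mu^{k+1}) \ge 1 - \Pr\Big(\lim_{n\to\infty} n^{-k} \sum_{\ell \in S} \ell \ne \mu^{k+1}\Big) - \sum_{\{L\} \setminus S} \Pr\Big(\lim_{n\to\infty} n^{-k} \sum_{\ell \in L} \ell \ne 0\Big) \qquad (4)$$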

Hence, we are done if we can show that (i)
$\Pr(\lim_{n\to\infty} n^{-k} \sum_{\ell \in S} \ell = \mu^{k+1}) = 1$ for $S$ and (ii)
$\Pr(\lim_{n\to\infty} n^{-k} \sum_{\ell \in L} \ell = 0) = 1$ for all other $L$. Although
$\sum_{\ell \in L} \ell$ is in general a sum of correlated random
variables, it is possible to adapt a standard proof of the
strong law of large numbers for uncorrelated random variables
to prove both items. We do this next.

Let's prove (ii) first. For brevity, let $S_n = \sum_{\ell \in L} \ell$
and choose $v$ to denote the number of nodes in the
path configuration of $L$. For any positive $\epsilon$ and $r = 1, 2, \ldots$,
Markov's inequality gives
$\Pr(|S_n| \ge (n\epsilon)^k) \le E(|S_n|^r)/(n\epsilon)^{kr}$. So if we can find an $r$ such that
$E(|S_n|^r)/(n\epsilon)^{kr} \le C/n^2$ for some constant $C$ (dependent
on $\epsilon$), then $\sum_n \Pr(|S_n| \ge (n\epsilon)^k)$ converges, and by the
first Borel-Cantelli lemma, $\Pr(|n^{-k} \sum_{\ell \in L} \ell| \ge \epsilon \text{ i.o.}) = 0$
for all $\epsilon > 0$ (where i.o. stands for infinitely often).
Careful reflection reveals that $\bigcup_{\epsilon} [|n^{-k} \sum_{\ell \in L} \ell| \ge \epsilon \text{ i.o.}]$
(for, say, all rational $\epsilon$) is the complementary event of
$[\lim_{n\to\infty} n^{-k} \sum_{\ell \in L} \ell = 0]$, and so we have arrived at the
desired result (ii).

Hence, in order to actually show (ii), we need to find
an $r$ such that $E(|S_n|^r)/(n\epsilon)^{kr} \le C/n^2$. Consider $r = 2$:
$E(S_n^2) = \sum_{x,y} E(\ell_x \ell_y)$, where each index of the sum ranges
independently over $L$. There are $(n-2)!/(n-2-v)!$ paths
$\ell$ in $L$, so there are fewer than $n^{2v}$ terms in $\sum_{x,y} E(\ell_x \ell_y)$,
and $E(S_n^2) \le D n^{2v}$ for some constant $D$. Since $v < k$
for all $L$ other than $S$, we have
$E(|S_n|^2)/(n\epsilon)^{2k} \le D n^{2v}/(n\epsilon)^{2k} \le C/n^2$ where $C = D\epsilon^{-2k}$, and the proof
of (ii) is complete.

Finally, to prove (i), start by replacing each factor $a_{xy}$
in $\ell$ with $b_{xy} + \mu$, where $b_{xy} = a_{xy} - \mu$. Now expand
the result and cancel $\mu^{k+1}$ from both sides of
$n^{-k} S_n = \mu^{k+1}$ to obtain $n^{-k} S'_n = 0$, where $S'_n$ is a sum over $S$ of
a polynomial $Q$ with $2^{k+1} - 1$ terms, each of the form
$\mu\mu \cdots \mu\, b_{uv} b_{wx} \cdots b_{yz}$ where at least one of the factors is
a $b_{xy}$ and the total number of factors in the term is $k+1$.
Note that each place of $Q$ corresponds to a particular set
of $b_{xy}$'s from the original simple path, e.g. the 14th place
of $Q$ might have $b_{xy}$'s corresponding to the 1st, 4th, 5th,
and 7th edges of the path, and $\mu$'s for the other edges.
Now let $m_q$ denote the number of vertices (excluding $i$
and $j$) among the subscripts of the $b_{xy}$'s in a given term.
The remaining $k - m_q$ nodes of the path not found in
the term (supplanted by the $\mu$'s) can take any of
$(n-2-m_q)!/(n-2-k)!$ permutations. Hence, there are no
more than $n^{k-m_q}$ identical copies of any one term in $S'_n$
from the same place in $Q$.

Now consider one of the $(2^{k+1} - 1)^4$ ways that terms in
the $2^{k+1} - 1$ places of $Q$ can be multiplied together in $S'^4_n$.
Note that this can produce no more than $n^{4k - \sum_{q=1}^{4} m_q}$
identical copies of the same term. Second, since the $b_{xy}$'s
each have expectation zero, every $b_{xy}$ in the final term
must appear to at least a power of two or the whole term
has expectation zero. This implies that for each nonvanishing
term, there must be some pattern of matching
between the $b_{xy}$'s. The number of possible matchings
is clearly a function of $k$ and not $n$ (it certainly is not
more than the number of partitions of $4(k+1)$ edges),
so consider one of these possible matchings. Now observe
that if, as we stated above, we consider only one of
the $(2^{k+1} - 1)^4$ ways that terms in the $2^{k+1} - 1$ places
of $Q$ can be multiplied together in $S'^4_n$, then no more
than $n^{\sum_{q=1}^{4} m_q/2}$ distinct nonvanishing terms can be
constructed per matching for any such way of combining
terms. This holds because each $b_{xy}$ needs at least one
match, and so the number of free nodes cannot exceed
half the total number of $b_{xy}$'s in the final term. Thus we
have shown the highest order of $n$ possible for $E(S'^4_n)$ is
given by the maximum value of $n^{\sum_{q=1}^{4} m_q/2}\, n^{4k - \sum_{q=1}^{4} m_q}$.
Since $m_q \ge 1$ for each $q = 1, \ldots, 4$, this can at most be
$n^{4k-2}$, which by the above reasoning completes the proof
of (i) and hence the full theorem. □

Lemma 2. Under the assumptions of Theorem 2,
$\lim_{n\to\infty} \lim_{N\to\infty} x_{ijnN} = \lim_{N\to\infty} \lim_{n\to\infty} x_{ijnN}$ with
probability 1 for $t \in [0, 1/K)$.

Proof. We need three ingredients for this proof. We 
will first describe the three ingredients and then show 
how they together prove Lemma 2. Throughout the 
following, all statements hold with probability 1 unless 
stated otherwise. 

As we found in the course of the proof of Lemma 1,
the limits $\lim_{n\to\infty} \alpha_{kn}$ exist for all $k$ and equal $\mu^{k+1}$
for $k \ge 1$, so $\lim_{n\to\infty} \sum_{k=0}^{N} \alpha_{kn} t^k$ exists on
$[0, 1/|\mu|)$, and we call it $x_{ij\infty N}(t)$. This gives us the
first ingredient: (i) $\lim_{n\to\infty} x_{ijnN}(t) = x_{ij\infty N}(t)$ for
$t \in [0, 1/|\mu|)$ and any $N$. Additionally, from Lemma
1 we know that $\lim_{N\to\infty} x_{ij\infty N}(t)$ exists and is
$a_{ij} - \mu + \mu/(1 - \mu t)$ on $[0, 1/|\mu|)$. We call this limit $x_{ij\infty\infty}(t)$, and
write the second ingredient as (ii)
$\lim_{N\to\infty} x_{ij\infty N}(t) = x_{ij\infty\infty}(t)$ for $t \in [0, 1/|\mu|)$.

Finally, as we saw in the proof of Lemma 1,
$\alpha_{kn} = n^{-k} \sum_{m_1, \ldots, m_k} a_{i m_1} a_{m_1 m_2} \cdots a_{m_k j}$
(by definition, not just with
probability 1), where the $k$ indices $m_x$ each range
independently from 1 to $n$. Since each $|a_{ij}| \le K$, we
must have that $|\alpha_{kn}| \le K^{k+1}$, which implies
$|x_{ijn\infty}(t) - x_{ijnN}(t)| \le K(Kt)^{N+1}/(1 - Kt)$. So if $|Kt| < 1$, then
for any $\epsilon > 0$, there is a sufficiently large $N_1$ independent
of $n$ such that $|x_{ijn\infty}(t) - x_{ijnN}(t)| < \epsilon$ for all
$N > N_1$. This constitutes our third ingredient, that
$x_{ijnN}(t)$ converges uniformly to $x_{ijn\infty}(t)$: (iii) for every
$\epsilon > 0$, there exists an $N_1$ such that for all $N > N_1$
and all $n$, $|x_{ijn\infty}(t) - x_{ijnN}(t)| < \epsilon$.
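
The uniform tail bound behind ingredient (iii) is just a geometric series, $\sum_{k > N} K^{k+1} t^k = K(Kt)^{N+1}/(1 - Kt)$, and the truncation order it requires can be computed without reference to $n$. The helper names below are ours, and the sketch only evaluates the bound; it does not simulate the system.

```python
def tail_bound(K, t, N):
    """Bound K (K t)^(N+1) / (1 - K t) on the Taylor remainder, valid for 0 <= K t < 1."""
    r = K * t
    if not 0.0 <= r < 1.0:
        raise ValueError("need 0 <= K*t < 1")
    return K * r ** (N + 1) / (1.0 - r)

def smallest_order(K, t, eps):
    """Smallest N whose tail bound is below eps; the answer does not depend on n."""
    N = 0
    while tail_bound(K, t, N) >= eps:
        N += 1
    return N

# Example with hypothetical values K = 1, t = 0.5: the bound equals 0.5**N.
order_needed = smallest_order(1.0, 0.5, 1e-3)   # → 10, since 0.5**10 < 1e-3 <= 0.5**9
```

Because the bound decreases monotonically in $N$, every order beyond the returned value also satisfies the tolerance, which is what uniformity in $n$ requires.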



To complete the proof of Lemma 2, we need to show
that $\lim_{n\to\infty} x_{ijn\infty}(t)$ exists and is just $x_{ij\infty\infty}(t)$ on
$[0, 1/K)$. Start by picking an $\epsilon > 0$. Then by (iii),
there exists an $N_1$ such that if $N > N_1$ then
$|x_{ijn\infty}(t) - x_{ijnN}(t)| < \epsilon$ for all $n$. Similarly, (ii) implies that there
exists an $N_2$ such that if $N > N_2$, then
$|x_{ij\infty\infty}(t) - x_{ij\infty N}(t)| < \epsilon$. Finally, let $N_3 = \max\{N_1, N_2\}$. Then
by (i), we may choose an $n_1$ such that if $n > n_1$, then
$|x_{ij\infty N_3}(t) - x_{ijnN_3}(t)| < \epsilon$. Now define the following
events:



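$$\begin{aligned}
E_1 &= [\,|x_{ij\infty\infty}(t) - x_{ij\infty N_3}(t)| < \epsilon\,] \\
E_2 &= [\,|x_{ij\infty N_3}(t) - x_{ijnN_3}(t)| < \epsilon\,] \\
E_3 &= [\,|x_{ijnN_3}(t) - x_{ijn\infty}(t)| < \epsilon\,] \\
E_4 &= [\,|x_{ij\infty\infty}(t) - x_{ijn\infty}(t)| < 3\epsilon\,]
\end{aligned} \qquad (5)$$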



Observe that, in similar form to Eq. 4, $(E_1 \cap E_2 \cap E_3) \subseteq E_4$,
so $\Pr(E_4) \ge \Pr(E_1 \cap E_2 \cap E_3) = 1 - \Pr(E'_1 \cup E'_2 \cup E'_3)
\ge 1 - \Pr(E'_1) - \Pr(E'_2) - \Pr(E'_3) = 1$ for all $n > n_1$.
Thus, $|x_{ij\infty\infty}(t) - x_{ijn\infty}(t)| < 3\epsilon$ for all $n > n_1$ and
$t \in [0, 1/K)$. □



Theorem 3. $\Pr(\lambda_1 \le 0)$ is exponentially small in $n$.

Proof. If $\lambda_1 \le 0$, then the corresponding matrix $A$
is negative semi-definite. Therefore $v^T A v \le 0$ for every
vector $v$ ($T$ denotes transposition). Let $v_k$ denote the
vector with $+1$ and $-1$ in its $(2k-1)$th and $2k$th components,
respectively, and 0 in all other components. Then
the event that $v_k^T A v_k \le 0$ is equivalent to the event that
$a_{(2k-1)(2k-1)} - a_{(2k-1)2k} - a_{2k(2k-1)} + a_{2k\,2k} \le 0$. Note
that the left-hand side of this final inequality has at least
a constant probability of being positive. Now as $k$ ranges
from 1 to $n/2$, we encounter $n/2$ independent events, each
having at least a constant probability of failure. Hence
the probability that $A$ is negative semi-definite is
exponentially small in $n$. □
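
The certificate used in this proof is easy to evaluate: each $v_k^T A v_k$ involves only a $2 \times 2$ diagonal block of $A$, and a single positive value rules out negative semi-definiteness. The sketch below is ours; the function names are hypothetical, indices are 0-based, and the Gaussian test matrix is only an illustration rather than the distributional setup of the theorem.

```python
import random

def paired_quadratic_forms(A):
    """v_k^T A v_k for v_k = e_{2k-1} - e_{2k}, k = 1..floor(n/2) (0-based indexing below)."""
    n = len(A)
    return [A[2 * k][2 * k] - A[2 * k][2 * k + 1] - A[2 * k + 1][2 * k] + A[2 * k + 1][2 * k + 1]
            for k in range(n // 2)]

def certifies_positive_eigenvalue(A):
    """True if some v_k^T A v_k > 0, so A cannot be negative semi-definite."""
    return any(q > 0 for q in paired_quadratic_forms(A))

random.seed(3)                                   # arbitrary seed, illustration only
n = 40
A = [[0.0] * n for _ in range(n)]
for r in range(n):
    for c in range(r, n):
        A[r][c] = A[c][r] = random.gauss(0.0, 1.0)   # hypothetical symmetric test matrix

demo_result = certifies_positive_eigenvalue(A)
```

With 20 independent blocks, each positive with probability about one half in this illustration, the chance that none certifies is roughly $2^{-20}$, which is the exponential decay the theorem describes.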

Very roughly, the final result of this supporting text
says that in the case of negative $\mu$, the distribution of
the sum of the components of $\omega_1 = (v_1, \ldots, v_n)$ retains
no more than constant width in the large-$n$ limit. So, for
example, the mean of the components of $\omega_1$ must shrink
to zero at least as fast as $1/n$. (Note that this convergence
is faster than the $1/\sqrt{n}$ convergence of means for
independent and identically distributed random variables
with finite mean and variance.) This result is consistent
with the picture that when $\mu$ is negative, the system is
destined for two-sided conflict in the large-$n$ limit.

To make the proof less cumbersome, suppose that the
diagonal $F_{ii}$ have common expectation $\mu$ and variance $\sigma^2$
just like the off-diagonal $F_{ij}$.



Theorem 4. If $\mu < 0$, then
$\lim_{n\to\infty} \Pr\big(\big|n^{-\delta} \sum_{i=1}^{n} v_i\big| < \epsilon\big) = 1$ for all $\delta, \epsilon > 0$.
Proof. Consider the eigenvalue equation
$\sum_{j=1}^{n} a_{ij} v_j = \lambda_1 v_i$. Sum both sides over $i$, and
then rearrange and switch index labels to obtain

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} v_i = \lambda_1 \sum_{i=1}^{n} v_i \qquad (6)$$



UNREFERENCED RESULTS 

In the main report, we establish that $\lambda_1 > 0$ in probability
by way of Wigner's semicircle law. However, we
can also show that $\lambda_1 > 0$ with high probability under
a different set of assumptions about how $A$ is selected.
Loosely speaking, the selection of the diagonal entries is
more constrained in this alternative approach while the
selection of the off-diagonal entries is less so.

Suppose that all $a_{ij}$, $|i - j| = 1$, are chosen from the
same distribution $F$ and all $a_{ii}$ chosen from the same
distribution $G$. All selections are independent for $i \le j$.
In addition, assume that the density of $a_{ii} - a_{ij}$ is not
entirely confined to negative values. Then we have the
following theorem.



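We anticipate from standard results of random matrix
theory that the left side of Eq. 6 is asymptotic to
$\mu n \sum_{i=1}^{n} v_i$ and the right side to $2\sigma\sqrt{n} \sum_{i=1}^{n} v_i$. So let's
define the two events

$$\begin{aligned}
E_1 &= \Big[\, \Big| \frac{1}{n^{1+\delta}} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} v_i - \frac{\mu}{n^{\delta}} \sum_{i=1}^{n} v_i \Big| < \epsilon \,\Big] \\
E_2 &= \Big[\, \Big| \frac{\lambda_1}{n} \sum_{i=1}^{n} v_i - \frac{2\sigma}{\sqrt{n}} \sum_{i=1}^{n} v_i \Big| < \epsilon \,\Big]
\end{aligned} \qquad (7)$$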



where $[\cdot]$ denotes an event. Now, $\Pr(E_1 \cap E_2) = 1 - \Pr(E'_1 \cup E'_2)
\ge 1 - \Pr(E'_1) - \Pr(E'_2)$, where we have written
$E'_x$ for the complementary event of $E_x$. So if we can
show that $\lim_{n\to\infty} \Pr(E_1) = 1$ and $\lim_{n\to\infty} \Pr(E_2) = 1$
for all $\delta, \epsilon > 0$, it follows that
$\lim_{n\to\infty} \Pr\big(\big|\mu n^{-\delta} \sum_{i=1}^{n} v_i - 2\sigma n^{-1/2-\delta} \sum_{i=1}^{n} v_i\big| < \epsilon + \epsilon n^{-\delta}\big) = 1$,
which implies Theorem 4.

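Hence our task reduces to showing that
$\lim_{n\to\infty} \Pr(E_1) = \lim_{n\to\infty} \Pr(E_2) = 1$. First consider
$\Pr(E_1)$. By Jensen's inequality and normalization
of $\omega_1$, $(n^{-1} \sum_{i=1}^{n} |v_i|)^2 \le n^{-1} \sum_{i=1}^{n} v_i^2 = n^{-1}$, so

$$n^{-1/2} \Big| \sum_{i=1}^{n} v_i \Big| \le n^{-1/2} \sum_{i=1}^{n} |v_i| \le 1 \qquad (8)$$

Now define the additional event:

$$E_3 = \Big[\, \max_{1 \le i \le n} \Big| n^{-1/2-\delta} \sum_{j=1}^{n} (a_{ij} - \mu) \Big| < \epsilon \,\Big] \qquad (9)$$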



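By Eq. 8, we have
$E_3 \subseteq \big[\max_{1 \le i \le n} |n^{-1/2-\delta} \sum_{j=1}^{n} (a_{ij} - \mu)| \; n^{-1/2} \sum_{i=1}^{n} |v_i| < \epsilon\big]
\subseteq \big[n^{-1-\delta} \sum_{i=1}^{n} |v_i| \, |\sum_{j=1}^{n} (a_{ij} - \mu)| < \epsilon\big] \subseteq E_1$.
The Bernstein inequality and a union
bound together imply that $\lim_{n\to\infty} \Pr(E_3) = 1$, since
they give that

$$\Pr(E'_3) \le 2n \exp\left( - \frac{n^{2\delta} \epsilon^2 / 2}{\sigma^2 + (K + |\mu|)\, n^{-1/2+\delta}\, \epsilon / 3} \right) \qquad (10)$$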



So, $\lim_{n\to\infty} \Pr(E_1) = 1$ as desired.
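
To see how quickly the Bernstein-plus-union bound vanishes, one can evaluate $2n \exp\big(-(n^{2\delta}\epsilon^2/2)/(\sigma^2 + M n^{-1/2+\delta}\epsilon/3)\big)$ for growing $n$. In the sketch below we take $M = K + |\mu|$ as the bound on $|a_{ij} - \mu|$, which is our assumption, and the parameter values are arbitrary illustrations.

```python
import math

def bernstein_union_bound(n, delta, eps, sigma, K, mu):
    """Upper bound on Pr(E_3'): a union over n row sums, each controlled by Bernstein's inequality."""
    numerator = n ** (2 * delta) * eps ** 2 / 2.0
    # K + |mu| bounds |a_ij - mu| when |a_ij| <= K (our assumption for the constant).
    denominator = sigma ** 2 + (K + abs(mu)) * n ** (-0.5 + delta) * eps / 3.0
    return 2.0 * n * math.exp(-numerator / denominator)

# Hypothetical parameters: delta = 0.25, eps = 0.5, sigma = 1, K = 2, mu = -1.
bounds = [bernstein_union_bound(n, 0.25, 0.5, 1.0, 2.0, -1.0)
          for n in (10 ** 4, 10 ** 6, 10 ** 8)]
```

The prefactor $2n$ grows only polynomially while the exponent grows like $n^{2\delta}$, so the bound collapses rapidly once $n^{2\delta}\epsilon^2$ exceeds a few multiples of $\sigma^2$.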

Regarding $\Pr(E_2)$, we know that $\lambda_1 \in 2\sigma\sqrt{n} + o(\sqrt{n})$ in
probability [1], which implies that $\lim_{n\to\infty} \Pr(|\lambda_1/\sqrt{n} - 2\sigma| < \epsilon) = 1$
and therefore $\lim_{n\to\infty} \Pr(E_2) = 1$ for all
$\epsilon > 0$ by Eq. 8. This completes the proof. □
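
A small experiment makes the content of Theorem 4 tangible. The sketch below is ours and assumes NumPy; the uniform entry distribution on $[\mu - 1, \mu + 1]$, the size $n = 400$, and the seed are arbitrary choices. It compares the sum of the components of the top eigenvector for a negative and a positive mean: in the first case the sum should stay of order one, while in the second it should be close to $\sqrt{n}$ with all components sharing one sign.

```python
import numpy as np

def random_symmetric(n, mu, rng):
    """Symmetric matrix whose upper-triangular entries are uniform on [mu - 1, mu + 1]."""
    U = rng.uniform(mu - 1.0, mu + 1.0, size=(n, n))
    return np.triu(U) + np.triu(U, 1).T

def top_eigenvector(A):
    """Unit eigenvector belonging to the largest eigenvalue of symmetric A."""
    lam, W = np.linalg.eigh(A)        # eigenvalues in ascending order
    return W[:, -1]

rng = np.random.default_rng(2)        # arbitrary seed, illustration only
n = 400
v_neg = top_eigenvector(random_symmetric(n, -1.0, rng))   # negative mean friendliness
v_pos = top_eigenvector(random_symmetric(n, +1.0, rng))   # positive mean friendliness

sum_neg = abs(v_neg.sum())            # expected to stay of order one
sum_pos = abs(v_pos.sum())            # expected to be close to sqrt(n) = 20
two_factions = bool((v_neg > 0).any() and (v_neg < 0).any())
one_faction = bool(np.all(v_pos * np.sign(v_pos.sum()) > 0))
```

Since the sign of each component determines faction membership, `two_factions` and `one_faction` give a rough picture of the conflict and harmony outcomes for these two hypothetical ensembles.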